Distance, dissimilarity index, and network community structure.

Author

  • Haijun Zhou
Abstract

We address the question of finding the community structure of a complex network. In an earlier effort [H. Zhou, Phys. Rev. E 67, 041908 (2003)], the concept of network random walking was introduced and a distance measure was defined. Here we calculate, based on this distance measure, the dissimilarity index between nearest-neighboring vertices of a network and design an algorithm to partition these vertices into hierarchically organized communities. Each community is characterized by an upper and a lower dissimilarity threshold. The algorithm is applied to several artificial and real-world networks, and excellent results are obtained. For artificially generated random modular networks, this method outperforms the algorithm based on edge betweenness centrality. For the yeast protein-protein interaction network, we are able to identify many clusters that have well-defined biological functions.
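The abstract does not state the formulas, so the following is only a minimal Python sketch under stated assumptions. (1) The distance d_ij is taken as the mean first-passage time of an unbiased random walker that moves to a uniformly chosen neighbor at each step. (2) The dissimilarity index of neighbors i and j is assumed to be Λ(i,j) = sqrt(Σ_{k≠i,j} (d_ik − d_jk)²) / (n − 2), i.e. how differently i and j "see" the rest of the network. (3) The helper `communities` is a hypothetical single-threshold grouping (union-find over edges whose Λ is below the threshold), not the paper's hierarchical algorithm; sweeping the threshold merely illustrates how a community can persist between a lower and an upper dissimilarity value.

```python
import math
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]  # adjacency lists of an undirected, connected graph


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def mfpt_distances(g: Graph) -> Dict[Tuple[int, int], float]:
    """d[i, j] = mean number of steps for a walker starting at i to first reach j."""
    nodes = sorted(g)
    d: Dict[Tuple[int, int], float] = {}
    for j in nodes:
        others = [v for v in nodes if v != j]
        idx = {v: k for k, v in enumerate(others)}
        # T_i = 1 + sum_k P_ik T_k with T_j = 0, i.e. (I - Q) T = 1.
        a = [[0.0] * len(others) for _ in others]
        for i in others:
            a[idx[i]][idx[i]] += 1.0
            for k in g[i]:
                if k != j:
                    a[idx[i]][idx[k]] -= 1.0 / len(g[i])
        t = solve(a, [1.0] * len(others))
        for i in others:
            d[i, j] = t[idx[i]]
        d[j, j] = 0.0
    return d


def dissimilarity(d, nodes: List[int], i: int, j: int) -> float:
    """Assumed index: RMS-style gap between the distance profiles of i and j (n >= 3)."""
    s = sum((d[i, k] - d[j, k]) ** 2 for k in nodes if k not in (i, j))
    return math.sqrt(s) / (len(nodes) - 2)


def communities(g: Graph, threshold: float) -> List[List[int]]:
    """Hypothetical grouping: join neighbors whose dissimilarity is below threshold."""
    nodes = sorted(g)
    d = mfpt_distances(g)
    parent = {v: v for v in nodes}

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i in nodes:
        for j in g[i]:
            if i < j and dissimilarity(d, nodes, i, j) < threshold:
                parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for v in nodes:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(c) for c in groups.values())


if __name__ == "__main__":
    # Two triangles (0-1-2 and 3-4-5) joined by the bridge edge 2-3.
    g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    print(communities(g, 2.0))
    print(communities(g, 4.0))
```

On this toy graph the bridge edge 2-3 has dissimilarity 3.5, while every intra-triangle edge stays at or below about 1.1. Any threshold between those values therefore recovers the two triangles, and a threshold above 3.5 merges everything into a single community.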


Similar resources

Semi-supervised composite kernel learning using distance metric learning techniques

Distance metric has a key role in many machine learning and computer vision algorithms so that choosing an appropriate distance metric has a direct effect on the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...


Detecting Community Structure in Complex Networks Using Bacterial Chemotaxis with Fuzzy C-means Clustering

Identification of (overlapping) communities/clusters in a complex network is a general problem in data mining of network data sets. In this paper, the bacterial chemotaxis (BC) strategy is used to maximize the modularity of a network, associating with a dissimilarity-index-based and with a diffusion-distance-based fuzzy c-means clustering iterative procedure. The proposed algorithm outperforms ...


Clustering gene expression data using random forest dissimilarity

Background: The clustering of gene expression data plays an important role in the diagnosis and treatment of cancer. These kinds of data typically involve a large number of variables (genes) compared with the number of samples (patients). Many clustering methods have been built based on the dissimilarity among observations that are calculated by a distance function. As increa...


The yeast protein-protein interaction map is a highly modular network with a staircase community structure

Summary: The construction of genome-wide protein–protein interaction maps makes it feasible to study the global organization of proteins in a biological cell. Here the module organization of the protein–protein interaction network (PPIN) of budding yeast is investigated by Netwalk, an algorithm based on biased random (Brownian) walks. The yeast PPIN is a highly modular network, it has a modula...


Ecological Dissimilarity Analysis: A Simple Method of Demonstrating Community-Habitat Correlations for Frequency Data

We introduce an analysis method to demonstrate correlation between biota and the physical habitats that they occupy. Using the same calculations as does Nei’s genetic distance index, this method builds independent dissimilarity matrices for both habitat and fauna, which can then be compared in a common statistical framework. An important advantage of this method is that only frequency data are ...



Journal:
  • Physical review. E, Statistical, nonlinear, and soft matter physics

Volume 67, Issue 6 Pt 1

Publication date: 2003